Abstract. A parallel (multiprocessor) system processing fault-tolerant programs was developed in [4,5]. The system performance is evaluated in this paper, using an analytic approach based on stochastic models. The analysis confirms the high effectiveness of a parallel system, under all practical circumstances, in reducing the program execution time increase due to run-time validation and system state saving. It also shows how the system performance is affected by various program characteristics.

1.  Introduction 

A system architecture for parallel execution of fault-tolerant programs (i.e., programs containing redundancy for the tolerance of residual program errors and/or hardware faults [7]) was developed in [4, 5]. The system was designed to execute block-structured fault-tolerant programs developed by Horning et al. [3]. A fault-tolerant block or recovery block is the basic component containing redundancy in these programs and has the following structure: ensure \(T\) by \(O_1\) else-by \(O_2\) else-by ... else-by \(O_n\) else-error, where \(T\) denotes the validation test, \(O_1\) the primary object block, and \(O_k\) (\(1 < k \le n\)) the alternate object blocks. All of the object blocks in a fault-tolerant block \(F\) compute the same or approximately the same objective function.

The validation test \(T\) is executed on exit from an object block to confirm that the object block has performed acceptably. The execution of a validation test results in either an acceptance (i.e., confirmation) or a rejection. If accepted, control exits from the fault-tolerant block. If the result produced by an object block is rejected, the next alternate is entered. After the alternate object block finishes its computation, the validation test is repeated. Before an alternate object block is entered, the system state is restored to the state that existed just before entry to the primary object block [1,2,3]. To enable this, a state vector that contains the values of all the variables (that may be changed by the object blocks) is saved on entry to a fault-tolerant block.
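The control flow just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the mechanism of [3]: the function `run_recovery_block`, the dictionary used as the state vector, and the square-root example are all assumptions introduced here.

```python
import copy


def run_recovery_block(state, validation_test, object_blocks):
    """Sketch of `ensure T by O1 else-by O2 ... else-error` on a mutable dict."""
    saved = copy.deepcopy(state)            # state vector saved on entry
    for block in object_blocks:             # primary first, then the alternates
        block(state)
        if validation_test(state):          # acceptance: control exits the block
            return True
        state.clear()                       # rejection: restore the entry state
        state.update(copy.deepcopy(saved))
    return False                            # every alternate rejected: else-error


def faulty_sqrt(state):                     # hypothetical primary with a residual error
    state["y"] = state["x"] / 2


def newton_sqrt(state):                     # hypothetical alternate
    y = state["x"] or 1.0
    for _ in range(30):
        y = 0.5 * (y + state["x"] / y)
    state["y"] = y


def acceptable(state):                      # hypothetical validation test T
    return abs(state["y"] ** 2 - state["x"]) < 1e-6


state = {"x": 9.0, "y": 0.0}
accepted = run_recovery_block(state, acceptable, [faulty_sqrt, newton_sqrt])
```

Here the primary is rejected, the state is restored, and the alternate is then accepted. Saving the whole state on entry is the simplest possible scheme; the recovery cache and the duplex memory discussed in Section 2 exist precisely to avoid that cost.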


The goal of the parallel execution is to overlap, as much as possible, execution of object blocks with the validation and system state saving. In this paper, we evaluate the performance of the parallel system. The approach used in this paper for performance evaluation is of an analytic nature and is based on stochastic models for both the parallel system and the sequential system (i.e., one in which the execution of an object block is not overlapped with the execution of a validation test). The evaluation shows the performance gain by parallel execution over sequential execution.

In the next section major characteristics of both an efficient sequential system and a parallel system are compared. Section 3.1 deals with the evaluation of the sequential system. Performance of the parallel system is evaluated in Section 3.2 and compared with the performance of the sequential system in Section 3.3.

2.  Distinguishing  Characteristics 
of  a Sequential  System  and  a Parallel  System 

In this section two systems, a sequential system using a memory organization called a recovery cache [1,3] and a parallel system using a duplex memory [4, 5], are briefly sketched.

The essence of the recovery cache scheme is to save the "original value" of each non-local variable \(W\) together with its logical address right before the variable is modified for the first time in a new object block. The original values are thus saved in a compact


† This work was supported in part by the Joint Service Electronics Program under Air Force Contract F44620-76-C-0061.


[Report documentation page: "Performance Evaluation of a Parallel System Processing Fault-Tolerant Programs." Interim report, Contract F44620-76-C-0061. Performing organization: Department of Electrical Engineering and Computer Science, University of Southern California, Los Angeles, CA. Controlling office: AFOSR/NE, Bldg 410, Bolling AFB, DC 20332. Unclassified; approved for public release, distribution unlimited. Book reprint: Proc. 1977 Int'l Conf. on Parallel Processing, pp. 118-127.]

table structure. For illustration, the fault-tolerant program in Figure 1a is used.

Figure 1b shows a snapshot of the recovery cache taken when primary object block \(O_{2,1}\) is in execution. As shown, there is a stack, called the cache stack, used for saving the original values. Similar to the main stack, the cache stack is also divided into regions, one region for each nested fault-tolerant block in the "active" state (i.e., a fault-tolerant block that has been entered but not exited). The top region of the cache stack in Figure 1b contains previous values of non-local variables together with their names (representing logical addresses), i.e., Y2, X1, X2, which have been modified during execution of the current object block \(O_{2,1}\). Similarly, the bottom region of the cache stack contains the previous value of non-local variable X1 which had been modified by execution of object block \(O_{1,1}\) before \(O_{2,1}\) was entered. Figure 1b also shows a flag field in the main stack. The flag attached to a variable indicates whether the original value of the variable has already been saved since the current object block was entered. Thus the flags attached to Y2, X1, X2 in the main stack are currently set.

If the result produced by execution of \(O_{2,1}\) fails the validation test \(V_2\), then the top region of the cache stack can be used to reset the main stack to the state that existed on entry to fault-tolerant block \(F_2\). If it passes the test, execution of \(F_2\) is complete and the top region \(C_2\) is merged into \(C_1\) so that the result will contain previous values of those variables which are non-local to \(O_{1,1}\) and have been modified since \(O_{1,1}\) was entered. Thus the result will be a single region containing (X1, 9) and (X2, 2). Flags in the main stack are also adjusted such that only flags of X1 and X2 are set. Therefore, the combination of the main and cache stacks usually contains information with which several old state vectors can be reconstructed.
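The save-on-first-write, reset, and merge steps of the recovery cache can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the hardware organization of [1,3]: the class `RecoveryCache` and its method names are hypothetical, membership in a region's `saved` table stands in for the flag field, and the initial values of Y1, Y2, Y3 and the value written to Y2 are invented for the example (only the X1 and X2 values follow the text).

```python
class RecoveryCache:
    """Toy main stack plus cache stack, one region per active block."""

    def __init__(self, variables):
        self.main = dict(variables)      # current values (the "main stack")
        self.regions = []                # cache stack, innermost region last

    def enter_block(self, local_vars=None):
        local_vars = dict(local_vars or {})
        self.main.update(local_vars)     # locals are allocated on entry
        self.regions.append({"locals": set(local_vars), "saved": {}})

    def write(self, name, value):
        region = self.regions[-1]
        # Only a non-local variable whose flag is not yet set needs saving.
        if name not in region["locals"] and name not in region["saved"]:
            region["saved"][name] = self.main[name]
        self.main[name] = value

    def reject(self):
        """Validation failed: reset the main stack from the top region."""
        region = self.regions[-1]
        self.main.update(region["saved"])
        region["saved"].clear()          # flags cleared for the next alternate

    def accept(self):
        """Validation passed: merge the top region into the one below it."""
        top = self.regions.pop()
        if self.regions:
            outer = self.regions[-1]
            for name, original in top["saved"].items():
                if name not in outer["locals"]:
                    outer["saved"].setdefault(name, original)  # keep older value


rc = RecoveryCache({"X1": 9, "X2": 2})
rc.enter_block({"Y1": 1, "Y2": 2, "Y3": 3})   # O_{1,1} with assumed locals
rc.write("X1", 8)
rc.enter_block()                              # O_{2,1}, nested in O_{1,1}
rc.write("Y2", 5)
rc.write("X1", 7)
rc.write("X2", 8)
rc.accept()                                   # O_{2,1} passes its validation test
merged_region = rc.regions[-1]["saved"]
```

After the merge, `merged_region` holds X1 with value 9 and X2 with value 2, matching the single region described above; Y2 is dropped because it is local to the enclosing object block.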

In the case of parallel execution at least two processors are used, a main processor for object block execution and a VR-(validation and recovery) processor or audit processor for execution related to validation and recovery. It is necessary to save a state vector on exit from an object block since the state vector is used by both the main processor and the VR-processor. This is accomplished by simultaneously storing the operand of each WRITE operation into two locations, one in the main stack and the other in the VR-store. When the main processor enters a fault-tolerant block \(F\), a VR-store-segment is created to keep an execution image which consists of records of assignments made by an object block in \(F\). A VR-store-segment consists of two sections, the L-(local variable) section for keeping records of assignments to variables local to the object block in execution and the N-(non-local variable) section for assignment records of non-local variables. A variable local to the object block being entered is allocated one location in the main stack and one location in the L-section of a VR-store-segment. New values assigned to variables that are non-local to the object block in execution are recorded together with the logical addresses (of the variables) in a table structure in the N-section of a VR-store-segment.


For illustration, Figure 1c shows the content of the VR-store at an instant during execution of the program in Figure 1a by a parallel system using a duplex memory. When the main processor entered the program (i.e., the outermost block), VR-store-segment \(S_0\) was created to keep assignment records of local variables X1 and X2. Since there are no variables non-local to the outermost block, \(S_0\) does not contain an N-section. When the main processor entered \(F_1\), VR-store-segment \(S_1\) was created. When non-local variable X1 was assigned the value "8" during execution of object block \(O_{1,1}\), a table entry (X1, 8) was made in \(S_{1N}\). Similarly, \(S_2\) was created when the main processor entered \(F_2\) and was filled by execution of object block \(O_{2,1}\). The content of the main stack in a duplex memory is that in a recovery cache minus the flag field.

On completion of \(O_{2,1}\), the main processor proceeds to the execution of \(F_3\) (which will be imaged in a new VR-store-segment \(S_3\)) while the VR-processor starts examining the execution image in \(S_2\) by execution of \(V_2\). If the result produced by execution of \(O_{2,1}\) (imaged in \(S_2\)) fails the validation test \(V_2\), then the non-local variables recorded in \(S_{2N}\) (and \(S_{3N}\), if not empty) are those which need to be reset. Segments \(S_0\) and \(S_1\) contain the values of the variables that existed when the main processor entered fault-tolerant block \(F_2\), and their values may be used to reset the main stack. A duplex memory may be implemented such that the previous value can be obtained in a single content-addressable memory (CAM) cycle [4, 5]. If the result of \(O_{2,1}\) passes \(V_2\), \(S_{2L}\) is discarded and \(S_{2N}\) is merged into \(S_1\) so that the result contains the assignment records of the variables addressable in \(O_{1,1}\), made since \(O_{1,1}\) was entered. This will result in \(S_{1L}\) containing "...", "..." and "3" for Y1, Y2, Y3, respectively and \(S_{1N}\) containing (X1, 7) and (X2, 8).
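The duplex-memory bookkeeping can be sketched in the same style. Again this is a toy model and not the CAM-based design of [4, 5]: the class `DuplexMemory`, its methods, the assumption that the enclosing segment is the one immediately below, and the values chosen for Y1, Y2, Y3 are all hypothetical; only the X1 and X2 values follow the text.

```python
class DuplexMemory:
    """Toy main stack plus VR-store; each segment has an L- and an N-section."""

    def __init__(self):
        self.main = {}
        self.segments = []                      # S0, S1, S2, ... in creation order

    def enter_block(self, local_vars=None):
        local_vars = dict(local_vars or {})
        self.main.update(local_vars)
        self.segments.append({"L": local_vars, "N": {}})

    def write(self, name, value):
        seg = self.segments[-1]
        self.main[name] = value                 # one WRITE, two destinations
        seg["L" if name in seg["L"] else "N"][name] = value

    def previous_value(self, name, below):
        """Most recent record of `name` in the segments below index `below`."""
        for seg in reversed(self.segments[:below]):
            for section in ("N", "L"):
                if name in seg[section]:
                    return seg[section][name]
        raise KeyError(name)

    def reject(self, index):
        """Segment `index` failed validation: erase it and every later segment."""
        for seg in self.segments[index:]:
            for name in seg["N"]:
                self.main[name] = self.previous_value(name, index)
            for name in seg["L"]:
                self.main.pop(name, None)
        del self.segments[index:]

    def accept(self, index):
        """Segment `index` passed: drop its L-section, merge N into its parent."""
        seg = self.segments.pop(index)
        parent = self.segments[index - 1]       # assumed enclosing segment
        for name, value in seg["N"].items():
            parent["L" if name in parent["L"] else "N"][name] = value


dm = DuplexMemory()
dm.enter_block({"X1": 9, "X2": 2})              # S0: outermost block
dm.enter_block({"Y1": 1, "Y2": 0, "Y3": 3})     # S1: F1 / O_{1,1}, assumed locals
dm.write("X1", 8)                               # recorded as (X1, 8) in S_1N
dm.enter_block()                                # S2: F2 / O_{2,1}
dm.write("Y2", 2)
dm.write("X1", 7)
dm.write("X2", 8)
dm.accept(2)                                    # O_{2,1} passes V2
s1 = dm.segments[1]
```

After `accept(2)`, the N-section of `s1` holds (X1, 7) and (X2, 8) as in the text. Note the contrast with the recovery cache: the VR-store records new values, so a reset has to look up the previous value in the older segments.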
Let us now compare the characteristics of the recovery cache scheme for sequential execution with the characteristics of the duplex memory scheme for parallel execution.

1. In both schemes, content-addressable memory modules are needed to obtain an acceptable level of performance in program execution, and in the rest of this paper the use of CAM modules is assumed.

2. The duplex memory takes more space than the recovery cache.

3. The WRITE operation into a non-local variable \(W\) involves two steps with the recovery cache, the first step being used for fetching the original value or the flag, while the WRITE operation takes one step (CAM cycle) with the duplex memory. Therefore, the execution of an object block is slower with the recovery cache than with the duplex memory.

4. Overall, it is expected that the recovery cache takes less merging time than the duplex memory. During the execution of a program in which no fault-tolerant block is nested within another fault-tolerant block, there is no merging involved with the recovery cache.

5. The parallel system is slower in recovery because (a) recovery of a variable takes more steps with the duplex memory than with the recovery cache and (b) there are more variables that need to be recovered in the parallel system because, while an execution image is being validated, the main processor normally proceeds to the successor block(s).

In summary, the parallel system largely trades recovery time increase for the reduction of total program execution time. There are cases, though highly impractical, where the performance of the parallel system is inferior to the performance of the sequential system.

Let \(\alpha\) denote the reliability of an object block, i.e., the probability of an average object block producing an accepted execution image. Then there is a lower bound \(\alpha_L\) such that when \(\alpha > \alpha_L\), the parallel system performs more efficiently than the sequential system. This lower bound is one of the values of interest examined in subsequent sections.

3.  Performance  Evaluation 

Given a fault-tolerant program, the average execution time of a fault-tolerant block is defined as the execution time of the program divided by the number of fault-tolerant blocks executed during the program execution. \(T_s\) and \(T_p\) denote the average execution time of a fault-tolerant block by the sequential system and by the parallel system, respectively. The system throughput is defined as the number of fault-tolerant blocks completed per unit time and is given by the inverse of the average execution time of a fault-tolerant block. We denote the sequential system throughput and the parallel system throughput by \(\mathit{THR}_s\) and \(\mathit{THR}_p\), respectively. Throughputs are used in this section as measures of the performance of the sequential system and of the parallel system.

For mathematical tractability, the following set of global assumptions has been adopted throughout the performance evaluation.

Assumption G

G.1 The programs considered in this analysis are of the type in which no fault-tolerant block is nested within another fault-tolerant block and whose execution becomes a sequential chain of fault-tolerant block executions (Figure 2).

G.2 Primary and alternate object blocks take the same average execution time.

G.3 Each fault-tolerant block contains an unlimited number of alternate object blocks (to eliminate the case of program failure).

In executing a program satisfying assumption G.1, the sequential system does not involve assignment record merging, as mentioned in Section 2. Assumption G.1 is adopted because of the difficulties in (1) dealing with a large spectrum of legitimate program structures, (2) keeping accounts of various execution times during execution of a general program (i.e., a program in which fault-tolerant blocks are nested one within another), etc. However, it is conjectured that the results in this paper on the performance comparison between the two systems for programs satisfying G.1 will not be far different from the results for general programs.

3.1 Throughput Evaluation for the Sequential System

The behavior of the sequential system during execution of a fault-tolerant block is depicted in Figure 3a. The system first enters the "object block execution" state \(s_o\) in which the processor executes an object block within the current fault-tolerant block. On completion of an object block, the system enters the "validation" state \(s_v\) in which the processor executes the validation test. If the validation results in a rejection, the system enters the "recovery" state \(s_r\), and on completion of the recovery, the system again enters \(s_o\) in which the processor executes an alternate object block. If the validation results in an acceptance, the system proceeds to the execution of the successor fault-tolerant block and repeats the above behavior.
During execution of fault-tolerant programs satisfying assumption G, the sequential system continuously repeats the process depicted in Figure 3a. We thus model the system behavior by the following stochastic process for the purpose of evaluating \(\mathit{THR}_s\).

Model S

S.1 There are three states which the sequential system may enter: \(s_o\) - object block execution, \(s_v\) - validation, and \(s_r\) - recovery. (Due to assumption G.1 there is no merging state.)

S.2 The time during which the system is in any state is exponentially distributed.

S.2.1 When the system is in state \(s_o\), the rate gs of generating an execution image (i.e., the probability of the system completing the execution of an object block within an infinitesimal time interval \(\Delta t\) is \(\mathit{gs} \cdot \Delta t\)) is \(\mathit{gs} = 1/t_{os}\), where \(t_{os}\) denotes the mean object block execution time in the sequential system. gs is called the generation rate.

S.2.2 When the system is in state \(s_v\), the rate v of completing the validation, called the validation rate, is \(v = 1/t_v\), where \(t_v\) denotes the mean validation time.

S.2.3 When the system is in state \(s_r\), the rate rs of completing the recovery, called the recovery rate, is \(\mathit{rs} = 1/t_{rs}\), where \(t_{rs}\) denotes the mean recovery time in the sequential system.

S.3 The probability of the system entering state \(s_o\) after leaving state \(s_v\) is \(\alpha\), while the probability of entering state \(s_r\) is \(\alpha' = 1 - \alpha\).

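Figure 3b depicts Model S. Let \(p_o\), \(p_v\), \(p_r\) denote the equilibrium probabilities [6] of the system being in \(s_o\), \(s_v\), \(s_r\), respectively. The steady-state behavior of the system is expressed by the following equilibrium equations.

\[
\begin{aligned}
p_o \cdot \mathit{gs} &= p_r \cdot \mathit{rs} + p_v \cdot v \cdot \alpha \\
p_v \cdot v &= p_o \cdot \mathit{gs} \\
p_o + p_v + p_r &= 1 \quad \text{(normalizing equation)}
\end{aligned}
\tag{1}
\]

Solving Eq. (1), we obtain

\[
\begin{aligned}
p_o &= \mathit{rs} \cdot v \,/\, (\mathit{gs} \cdot v \cdot \alpha' + \mathit{rs} \cdot v + \mathit{gs} \cdot \mathit{rs}) \\
p_v &= \mathit{rs} \cdot \mathit{gs} \,/\, (\mathit{gs} \cdot v \cdot \alpha' + \mathit{rs} \cdot v + \mathit{gs} \cdot \mathit{rs}) \\
p_r &= \mathit{gs} \cdot v \cdot \alpha' \,/\, (\mathit{gs} \cdot v \cdot \alpha' + \mathit{rs} \cdot v + \mathit{gs} \cdot \mathit{rs})
\end{aligned}
\tag{2}
\]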

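By definition system throughput is equal to the number of execution images accepted per unit time. Throughput \(\mathit{THR}_s\) and its inverse \(T_s\) can thus be obtained as follows.

\[
\mathit{THR}_s = p_v \cdot v \cdot \alpha
= \frac{\mathit{rs} \cdot \mathit{gs} \cdot v \cdot \alpha}{\mathit{gs} \cdot v \cdot \alpha' + \mathit{rs} \cdot v + \mathit{gs} \cdot \mathit{rs}}
\tag{3a}
\]

\[
T_s = 1/\mathit{THR}_s
= \frac{\mathit{gs} \cdot v \cdot \alpha' + \mathit{rs} \cdot v + \mathit{gs} \cdot \mathit{rs}}{\mathit{rs} \cdot \mathit{gs} \cdot v \cdot \alpha}
= \frac{1}{\alpha}\,(t_{os} + t_v) + \frac{\alpha'}{\alpha}\, t_{rs}
\tag{3b}
\]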
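As a numerical cross-check of Model S, the following Python sketch evaluates the equilibrium probabilities and the throughput from the rates, and compares the result with the closed form for the mean block time. The function names and the sample parameter values are assumptions chosen only for illustration.

```python
def sequential_model(t_os, t_v, t_rs, alpha):
    """Equilibrium probabilities and throughput of the three-state Model S."""
    gs, v, rs = 1.0 / t_os, 1.0 / t_v, 1.0 / t_rs   # generation, validation, recovery rates
    alpha_rej = 1.0 - alpha                         # rejection probability
    denom = gs * v * alpha_rej + rs * v + gs * rs
    p_o = rs * v / denom                            # object block execution
    p_v = rs * gs / denom                           # validation
    p_r = gs * v * alpha_rej / denom                # recovery
    thr = p_v * v * alpha                           # acceptances per unit time
    return {"p_o": p_o, "p_v": p_v, "p_r": p_r, "THR_s": thr, "T_s": 1.0 / thr}


def mean_block_time(t_os, t_v, t_rs, alpha):
    """Closed form: (1/alpha)(t_os + t_v) + (alpha'/alpha) t_rs."""
    return (t_os + t_v) / alpha + (1.0 - alpha) / alpha * t_rs


# Hypothetical parameter values, in units of the mean object block time.
result = sequential_model(t_os=1.0, t_v=0.5, t_rs=0.2, alpha=0.95)
closed_form = mean_block_time(t_os=1.0, t_v=0.5, t_rs=0.2, alpha=0.95)
```

The closed form has a direct reading: the number of object block executions per fault-tolerant block is geometric with mean \(1/\alpha\), each followed by one validation, and each of the \(\alpha'/\alpha\) expected rejections adds one recovery.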

3.2 Throughput Evaluation for the Parallel System

In most cases the main processor need not be synchronized with the VR-processor. However, when the next fault-tolerant block to be executed specifies irreversible actions of critical nature, the main processor waits until the VR-processor accepts all the execution images in the queue (i.e., the execution images of the predecessor fault-tolerant blocks) [4, 5]. An execution image generated immediately before a block specifying an irreversible action is entered is a "synchronizing" execution image (or for short, S-image). The other execution images are "normal" execution images (or N-images).

An abstract representation of the parallel system with unbounded queue is shown in Figure 4. The main processor continuously constructs execution images and puts the completed execution images into the queue of execution images except when (1) the VR-processor stops it on rejection of an execution image and enters the recovery state, or (2) the main processor has generated a synchronizing execution image and put it into the queue. The VR-processor validates execution images in the order of their arrival. When it accepts an execution image, it enters the "merging" state. On completion of merging, it checks if another execution image is waiting in the queue. If an execution image is rejected, the main processor is stopped and recovery is initiated. Recovery involves a sequence of assignment reversals using the assignment records in the execution images and thus can be thought of as a process of "erasing" the execution images in the queue. On completion of the recovery, the queue is empty and the main processor is restarted.

The parallel system is thus modeled by the following stochastic process.

Model P

P.1 The state of the system at any instant is characterized by (1) the state of the VR-processor, which may be in wait, validation, merging or recovery, and (2) the number and types of execution images in the queue. The state of the main processor is busy or waiting and is determined by the state of the VR-processor and the state of the queue. Thus each system state is denoted by \(s_{\text{VR-processor state},\,\text{queue state}}\), where (1) VR-processor state = w (wait), v (validation), m (merging), or r (recovery), and (2) queue state = 0 (empty), N (one normal execution image), S (one synchronizing execution image), $ (= N or S), NN, NS, $N, $S, NNN, NNS, $NN, $NS, ...

Some possible states of the system are shown in Figure 5, where some possible state transitions are also indicated. For example, \(s_{v,N}\) is the state where the queue contains one normal execution image which the VR-processor is validating. There are four states which the system may enter from \(s_{v,N}\): \(s_{v,NN}\), which is entered if the main processor generates another normal execution image; \(s_{v,NS}\), which is entered if the main processor generates a synchronizing execution image; \(s_{m,\$}\), which is entered if the VR-processor accepts the normal execution image in the queue; and \(s_{r,N}\), which is entered if the VR-processor rejects the normal execution image in the queue. In \(s_{r,N}\) the system erases the normal execution image in the queue and thereafter enters state \(s_{r,0}\), in which the system erases the partially constructed execution image contained within the main processor. Note that the type of the first image in the queue is not distinguished in some states (e.g., \(s_{m,\$}\)). This is because once an execution image is accepted, the system's future behavior is independent of the type of the execution image just accepted.


P.2 The time during which either processor is in a particular state is exponentially distributed.

P.2.1 When the main processor is in a busy state, the generation rate gp is \(\mathit{gp} = 1/t_{op}\), where \(t_{op}\) represents the mean object block execution time (which is different from \(t_{os}\)).

P.2.2 When the VR-processor is in a validation state, the validation rate v is \(v = 1/t_v\), where \(t_v\) represents the mean validation time.

P.2.3 When the VR-processor is in a merging state, the rate mp of completing the merging, called the merging rate, is \(\mathit{mp} = 1/t_{mp}\), where \(t_{mp}\) represents the mean merging time.

P.2.4 When the system is in a recovery state other than \(s_{r,0}\), the rate rp of erasing an execution image, called the recovery rate, is \(\mathit{rp} = 1/t_{rp}\), where \(t_{rp}\) represents the mean time for erasing an execution image.

P.2.5 The size of the partially constructed execution image remaining within the main processor when the system enters a recovery state is assumed to be proportional to the amount of time that the main processor has spent in construction of that execution image. Borrowing a result from renewal theory, the mean size of the execution image partially constructed (when the system enters a recovery state from a state where the main processor is busy) is the same as the mean size of a completed execution image [6]. Thus when the system is in \(s_{r,0}\), the rate of moving from \(s_{r,0}\) to \(s_{w,0}\) is also rp.

P.3 The probability of a validation resulting in an acceptance is \(\alpha\) as before, while the probability of a rejection is \(\alpha' = 1 - \alpha\).

P.4 The probability of a newly generated execution image being an N-image is \(\eta\), while that for being an S-image is \(\eta' = 1 - \eta\).

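Figure 5 depicts Model P. It also shows the notation for the equilibrium probability of the system being in each state \(s_{i,j}\). The probabilities are denoted by \(I\) (for \(s_{w,0}\)), \(J\) (for \(s_{m,\$}\)), \(z_k\), \(y_k\), \(w_k\), \(x_k\), \(u_0\) (for \(s_{r,0}\)), \(u_k\), \(q_k\), where \(k = 1, 2, \ldots\) except that there does not exist \(y_1\) nor \(x_1\). The subscript k indicates the number of execution images present in the queue. The steady-state behavior of the system is then expressed by the following equilibrium equations.

\[
\begin{aligned}
&\text{(a)} & I \cdot \mathit{gp} &= J \cdot \mathit{mp} + u_0 \cdot \mathit{rp} + q_1 \cdot \mathit{rp} \\
&\text{(b)} & J \cdot (\mathit{gp} + \mathit{mp}) &= (z_1 + w_1) \cdot v \cdot \alpha \\
&\text{(c)} & z_1 \cdot (\mathit{gp} + v) &= I \cdot \mathit{gp} \cdot \eta + y_2 \cdot \mathit{mp} \\
&\text{(d)} & z_k \cdot (\mathit{gp} + v) &= z_{k-1} \cdot \mathit{gp} \cdot \eta + y_{k+1} \cdot \mathit{mp} && \text{for } k = 2, 3, \ldots \\
&\text{(e)} & y_2 \cdot (\mathit{gp} + \mathit{mp}) &= J \cdot \mathit{gp} \cdot \eta + z_2 \cdot v \cdot \alpha \\
&\text{(f)} & y_k \cdot (\mathit{gp} + \mathit{mp}) &= y_{k-1} \cdot \mathit{gp} \cdot \eta + z_k \cdot v \cdot \alpha && \text{for } k = 3, 4, \ldots \\
&\text{(g)} & x_2 \cdot \mathit{mp} &= J \cdot \mathit{gp} \cdot \eta' + w_2 \cdot v \cdot \alpha \\
&\text{(h)} & x_k \cdot \mathit{mp} &= y_{k-1} \cdot \mathit{gp} \cdot \eta' + w_k \cdot v \cdot \alpha && \text{for } k = 3, 4, \ldots \\
&\text{(i)} & w_1 \cdot v &= I \cdot \mathit{gp} \cdot \eta' + x_2 \cdot \mathit{mp} \\
&\text{(j)} & w_k \cdot v &= z_{k-1} \cdot \mathit{gp} \cdot \eta' + x_{k+1} \cdot \mathit{mp} && \text{for } k = 2, 3, \ldots \\
&\text{(k)} & u_0 \cdot \mathit{rp} &= u_1 \cdot \mathit{rp} \\
&\text{(l)} & u_k \cdot \mathit{rp} &= z_k \cdot v \cdot \alpha' + u_{k+1} \cdot \mathit{rp} && \text{for } k = 1, 2, \ldots \\
&\text{(m)} & q_k \cdot \mathit{rp} &= w_k \cdot v \cdot \alpha' + q_{k+1} \cdot \mathit{rp} && \text{for } k = 1, 2, \ldots \\
&\text{(n)} & I + J + u_0 + \sum_{k=1}^{\infty} (z_k + y_k + w_k + x_k + u_k + q_k) &= 1 && \text{(normalizing equation)}
\end{aligned}
\tag{4}
\]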

Solving this system of equations, we can obtain the equilibrium probabilities. This system can be solved in closed form, but the solution procedure is not described here. Since the system throughput \(\mathit{THR}_p\) was defined as the number of acceptances made per unit time, \(\mathit{THR}_p\) and \(T_p\) can be obtained by

\[
\mathit{THR}_p = v \alpha \sum_{k=1}^{\infty} (z_k + w_k), \qquad T_p = 1/\mathit{THR}_p .
\tag{5}
\]

Another measure of interest is the expected queue-length E(QL).

\[
E(QL) = J + \sum_{k=1}^{\infty} k \cdot (z_k + y_k + w_k + x_k + u_k + q_k), \quad \text{where } y_1 = x_1 = 0 .
\tag{6}
\]
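The source solves Model P in closed form and does not give the procedure. As a rough numerical alternative, the sketch below builds the same Markov chain with the queue truncated at K images and solves the balance equations by Gaussian elimination in plain Python. The function name, the truncation bound K, and the sample parameters are assumptions; the transition structure is read off from the equilibrium equations (4) and should be treated as a reconstruction.

```python
def solve_parallel_model(t_op, t_v, t_mp, t_rp, alpha, eta, K=25):
    """Return (THR_p, E(QL)) for Model P with the queue truncated at K images."""
    gp, v, mp, rp = 1.0 / t_op, 1.0 / t_v, 1.0 / t_mp, 1.0 / t_rp
    rates = {}                                    # (source, target) -> rate

    def add(src, dst, rate):
        if rate > 0.0:
            rates[(src, dst)] = rates.get((src, dst), 0.0) + rate

    I, J = ("I", 0), ("J", 1)                     # s_{w,0} and s_{m,$}
    add(I, ("z", 1), gp * eta)
    add(I, ("w", 1), gp * (1.0 - eta))
    add(J, I, mp)
    add(("u", 0), I, rp)                          # erase the partial image
    for k in range(1, K + 1):
        z, w, y, x = ("z", k), ("w", k), ("y", k), ("x", k)
        add(z, J if k == 1 else y, v * alpha)     # acceptance, main processor busy
        add(z, ("u", k), v * (1.0 - alpha))       # rejection
        add(w, J if k == 1 else x, v * alpha)     # acceptance, main processor waiting
        add(w, ("q", k), v * (1.0 - alpha))
        add(("u", k), ("u", k - 1), rp)           # erase one queued image
        add(("q", k), I if k == 1 else ("q", k - 1), rp)
        if k >= 2:
            add(y, ("z", k - 1), mp)              # merging done, validate next image
            add(x, ("w", k - 1), mp)
        if k < K:                                 # truncation: no growth beyond K
            src = J if k == 1 else y
            for busy in (z, src):
                kind = "z" if busy is z else "y"
                kind_s = "w" if busy is z else "x"
                add(busy, (kind, k + 1), gp * eta)
                add(busy, (kind_s, k + 1), gp * (1.0 - eta))

    states = sorted({s for edge in rates for s in edge})
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    A = [[0.0] * (n + 1) for _ in range(n)]       # augmented balance equations
    for (src, dst), r in rates.items():
        A[idx[dst]][idx[src]] += r                # flow into dst
        A[idx[src]][idx[src]] -= r                # flow out of src
    A[-1] = [1.0] * n + [1.0]                     # one equation replaced by sum = 1
    for col in range(n):                          # Gauss-Jordan, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    pi = {s: A[idx[s]][n] / A[idx[s]][idx[s]] for s in states}
    thr = v * alpha * sum(p for (kind, _), p in pi.items() if kind in ("z", "w"))
    eql = sum(k * p for (_, k), p in pi.items())
    return thr, eql


thr_p, eql = solve_parallel_model(t_op=1.0, t_v=0.3, t_mp=0.1, t_rp=0.15,
                                  alpha=0.95, eta=0.9)
```

One sanity check that does not rely on the figures: with \(\alpha = 1\) and \(\eta = 1\) the chain reduces to an M/G/1 queue whose service time is validation plus merging, so the throughput should equal gp and E(QL) should match the Pollaczek-Khinchine mean, about 0.617 for \(t_{op} = 1\), \(t_v = 0.3\), \(t_{mp} = 0.1\).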


Figure 6 depicts the expected queue-length E(QL) for various values of \(\alpha\), \(\eta\), \(t_v/t_{op}\), \(t_{mp}/t_{op}\), \(t_{rp}/t_{mp}\). In examining Figures 6 and 7 we are mostly interested in the cases where \(\alpha\) is greater than 0.9. Since fault-tolerant programs dealt with here are supposed to have undergone a testing phase before being put into operation, one or more erroneous object blocks out of ten seems highly improbable. On the other hand, \(\eta\) is application-dependent and may not be very close to 1. For example, \(\eta = 0.999\) implies that only one among 1000 execution images generated is an S-image. In this evaluation, \(\eta\) is set mostly within the range of 0.9-0.95 and the most frequently used values are 0.9 for \(\eta\) and 0.95 for \(\alpha\). The following practical constraints were also adopted.


\[
t_v < t_{op}, \qquad t_{mp} < t_{op}, \qquad \ldots < t_{rp}/t_{mp} \le 1.5
\tag{7}
\]

As expected, E(QL) becomes larger as \(\alpha\) or \(\eta\) increases. Furthermore, comparison of curve 3 in Figure 6a (which is a result of changing \(\alpha\) when \(\eta = 0.95\)) with curve 2' (a result of changing \(\eta\) when \(\alpha = 0.95\)) indicates that E(QL) is more sensitive to the change of \(\eta\) than to the change of \(\alpha\). This is also shown by a comparison of curve 2 (a result of changing \(\alpha\) when \(\eta = 0.9\)) with curve 1' (a result of changing \(\eta\) when \(\alpha = 0.9\)). Figure 6b shows that E(QL) increases as mean validation time \(t_v\) or mean merging time \(t_{mp}\) increases. When \(t_v + t_{mp} < t_{op}\), E(QL) is generally smaller than 5. The data obtained but not plotted in Figure 6 indicated that mean recovery time \(t_{rp}\) affects E(QL) to a negligible extent. This is because (1) when \(\alpha\) is large, the system rarely enters a recovery state and (2) when \(\alpha\) is small, the system rarely enters a state where the queue-length is large.


3.3 Performance Comparison Between the Sequential System and the Parallel System

A simple way of assessing the performance of the parallel system is to compare the throughput \(\mathit{THR}_p\) with the throughput \(\mathit{THR}_s\) of the sequential system. \(\mathit{THR}_p/\mathit{THR}_s\) is then the throughput ratio and is a function of \(\alpha\), \(\eta\), \(t_v/t_{op}\), \(t_{mp}/t_{op}\), \(t_{rp}/t_{mp}\), \(t_{os}/t_{op}\), \(t_{rp}/t_{rs}\). Here \(t_{os}/t_{op}\) represents the object block execution time ratio while \(t_{rp}/t_{rs}\) represents the recovery time ratio. These parameters are within the following ranges (cf. Section 2 or [5] for more details).

\[
1 < t_{os}/t_{op} \le 2, \qquad 1 < t_{rp}/t_{rs} < 1.5
\tag{8}
\]


Figure 7 depicts the throughput ratio for various values of parameters subject to the constraints in Eqs. (7) and (8). First, Figure 7a discloses that variation of recovery time ratio \(t_{rp}/t_{rs}\) within a practical range has little effect on the throughput ratio. This is again because (1) when \(\alpha\) is large, the system rarely gets into a recovery state, and (2) when \(\alpha\) is small, E(QL) becomes small and thus a recovery involves mostly a small number of execution images. Figure 7b indicates that the throughput ratio is not much affected by the change of \(t_{rp}/t_{mp}\) within a practical range, while it is significantly affected by object block execution time ratio \(t_{os}/t_{op}\).

Object block execution time ratio \(t_{os}/t_{op}\), recovery time ratio \(t_{rp}/t_{rs}\) and \(t_{rp}/t_{mp}\) are machine characteristics while other parameters represent program characteristics.

Figure 7c shows that the throughput ratio
decreases as merging time t_mp (more precisely
t_mp/t_op) increases. The obvious reason is
that under assumption G.1 merging is involved
only in parallel execution. It also shows
that an increase of t_v causes a throughput ratio
increase approximately until t_v + t_mp surpasses
t_op, but further increase of t_v does not change
(actually slightly decreases) the throughput ratio.
This can be explained as follows. As t_v + t_mp
becomes larger than t_op, E(QL) becomes large
and thus, each time a synchronizing execution
image is generated, the queue contains a large
number of execution images. The validation
and merging of these are not overlapped with
object block execution. Figure 7d confirms
the expectation that as η increases, the
throughput ratio increases.

In summary, (1) for a practical α, the
performance improvement by parallel execution
is most sensitive to object block execution time
ratio t_os/t_op and t_mp/t_op, less sensitive to
t_v/t_op and the least sensitive to t_rp/t_mp and
recovery time ratio t_rp/t_rs, and (2) the
throughput ratio ranged over 1.02 - 1.65 (or
2-65% gain) for α = 0.95 and for the values of
other parameters plotted in Figure 7.
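
The correspondence between a throughput ratio and the quoted percentage gain is simple arithmetic: gain = (THR_p/THR_s - 1) x 100. A minimal Python sketch follows. The function name is illustrative and not from the paper, and the sequential throughput is normalised to 1 purely for the example.

```python
def throughput_gain_percent(thr_parallel: float, thr_sequential: float) -> float:
    """Percentage throughput gain of the parallel system over the sequential one."""
    if thr_sequential <= 0:
        raise ValueError("sequential throughput must be positive")
    ratio = thr_parallel / thr_sequential   # the throughput ratio THR_p/THR_s
    return (ratio - 1.0) * 100.0


# Endpoints of the throughput-ratio range reported above, with the
# sequential throughput normalised to 1 (an assumption for illustration).
low = throughput_gain_percent(1.02, 1.0)
high = throughput_gain_percent(1.65, 1.0)
print(f"{low:.0f}% to {high:.0f}%")   # prints "2% to 65%"
```

A ratio below 1 yields a negative gain, which corresponds to the region where the sequential system outperforms the parallel one.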

Figure 7a also displays the existence of
α_L (defined in Section 2 as the lower bound of
α to make the performance of the parallel
system superior to that of the sequential
system). The data obtained but not fully plotted
in Figure 7 showed that in all the cases depicted
in Figure 7, α_L did not exceed 0.87 and
rarely went above 0.6. It can conservatively
be said that the practical range of α is far
above α_L.

4.  Summary 

The analysis made in this paper confirmed
that parallel execution can reduce the execution
time increase inherent in fault-tolerant programs.
The analysis demonstrated two main
points. First, under all practical circumstances
the parallel system showed good performance.
The performance was particularly good when α
was above 0.9 or 0.95. It is believed that α
would always be in such a range for programs
which have undergone a reasonable degree of
testing before being put into operation. Second,
it showed how the effectiveness of parallel
execution was affected by various program
characteristics. Although no real statistics on
various program characteristics are available,
it is believed that our examination covered a
broad range of reasonable values for each
parameter. Availability of a parallel system
may influence the program characteristics to
some extent.

In short, the parallel execution approach
allows the incorporation of extensive validation
and recovery facilities without the associated
expensive execution time overhead. The price
paid is the increased hardware requirement.

Figure 1a. A block-structured fault-tolerant program.

Figure 1b. Recovery cache during execution of 1a.

Figure 1c. VR-store during execution of 1a.

Figure 2. Execution of a fault-tolerant program of the type assumed.

Figure 3a. Behavior of the sequential system during execution.

Figure 3b. Model S.